Shapely DNA attracts the right partner.

Authors

  • Teresa M Przytycka
  • David Levens
Abstract

All levels of cell activity are coordinated directly or indirectly by transcription factors (TFs). In turn, the functioning of TFs relies on their ability to recognize and bind specific DNA sequences to regulate the expression of specific genes. How exactly this specificity is achieved is still not fully understood. TFs are often supposed to execute a binary decision to bind or not bind at a target sequence, in much the same way that a restriction enzyme cuts or does not cut at its restriction site. In reality, rather than binding to a unique sequence, most TFs bind with various affinities to a range of related sequences. This molecular recognition is achieved through complementary interactions between protein and DNA surfaces and their functional groups. These interactions must provide enough information both to define the binding site sequence and to discriminate authentic binding sites from a cloud of related sites that might be made accessible by thermal fluctuations (1).

To capture these interactions, genome-wide prediction of TF-binding sites and their affinities (or, ideally, binding free energies) relies chiefly on quantitative models based on experimentally determined or computationally predicted binding sites. Many of these models are mechanistically agnostic, simply exploiting the statistical enrichment of sequences recovered from in vivo or in vitro binding experiments irrespective of the detailed chemistry and physics of site recognition. These quantitative models of TF binding can also be used for predicting disease-causing mutations.

The simplest model of TF binding assumes that the preference for any nucleotide within a DNA binding site is independent of the nucleotides in the remaining positions. Such independent position models are typically represented by position weight matrices (PWMs), which report, for each nucleotide at every position, that nucleotide's contribution to the total TF binding affinity score (2).
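
The independent position model described above can be illustrated with a minimal sketch. The counts in `TOY_COUNTS`, the pseudocount, the uniform background frequency, and all function names are hypothetical choices made for illustration; they are not taken from the article or from any particular motif library.

```python
import math

BASES = "ACGT"


def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights.

    The pseudocount and uniform background are illustrative assumptions.
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def score_site(pwm, site):
    """Sum the per-position weights; each position contributes independently."""
    if len(site) != len(pwm):
        raise ValueError("site length must match PWM width")
    return sum(weights[base] for weights, base in zip(pwm, site))


def scan(pwm, sequence):
    """Score every window of PWM width along the sequence."""
    width = len(pwm)
    return [
        (offset, score_site(pwm, sequence[offset:offset + width]))
        for offset in range(len(sequence) - width + 1)
    ]


# Hypothetical counts for a 4-bp site whose consensus is ACGT.
TOY_COUNTS = [
    {"A": 9, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 9, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 9, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 9},
]

pwm = build_pwm(TOY_COUNTS)
hits = scan(pwm, "TTACGTAA")
best_offset, best_score = max(hits, key=lambda hit: hit[1])
```

Because the score is a plain sum over positions, changing one base changes the score only by the difference of the two weights at that position. Higher-order models relax exactly this additivity.
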
Although such models have been very successful, they are known to be imperfect. In PNAS, Zhou et al. (3) show that information about DNA shape can improve TF-binding models significantly.

High-throughput experimental techniques, such as protein-binding microarrays (PBMs) (4) or high-throughput systematic evolution of ligands by exponential enrichment (HT-SELEX) (5–8), have provided opportunities for constructing more accurate but necessarily more complex models. As natural extensions of the independent contribution PWM model (see ref. 9 for a recent review), new models often assume that the contribution of any nucleotide at a given position depends not only on the identity of this nucleotide but also on the identity of one or more preceding nucleotides (10). Consequently, the algorithms to build these models often estimate the contribution of dimers, trimers, and generally k-mers to the total binding affinity. Unfortunately, the number of features used in such higher-order models often increases exponentially with k while yielding, as shown by the DREAM5 challenge (11), only a modest gain. At the same time, large numbers of features become problematic, especially when the experimental data used for training are not abundant, because they increase the risk of overfitting. Therefore, it is important to focus on the most informative features.

Although the complementarity between TFs and their binding sites is obviously dependent on the local features shaping the DNA molecule in 3D (12), such as the major and minor groove surfaces, DNA bending, etc. (Fig. 1), until recently there has not been much effort to include such DNA features explicitly in TF-binding prediction models. It has been assumed that the specification of the individual bases encapsulates and captures the interaction chemistry implicitly. However, these local shape properties are sequence dependent: they vary from site to site and can be modeled from the linear DNA sequence information (13).
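
The growth in feature count for k-mer models mentioned above can be made concrete with a short sketch. It assumes a position-specific one-hot encoding, which is one common way to build such features and not necessarily the encoding used by Zhou et al.; the function names and the 10-bp example site are hypothetical.

```python
from itertools import product


def kmer_feature_count(site_length, k):
    """Count position-specific k-mer features: 4**k choices at each offset."""
    if not 1 <= k <= site_length:
        raise ValueError("k must be between 1 and the site length")
    return (site_length - k + 1) * 4 ** k


def encode_kmers(site, k):
    """One-hot encode which k-mer occurs at each offset of the site."""
    kmers = ["".join(letters) for letters in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    n_offsets = len(site) - k + 1
    features = [0] * (n_offsets * len(kmers))
    for offset in range(n_offsets):
        features[offset * len(kmers) + index[site[offset:offset + k]]] = 1
    return features


# Feature counts for a hypothetical 10-bp site as k grows.
counts_by_k = {k: kmer_feature_count(10, k) for k in (1, 2, 3, 4)}
dimer_features = encode_kmers("ACGTACGTAC", 2)
```

For a 10-bp site this gives 40 mononucleotide features, 144 dinucleotide features, 512 trinucleotide features, and 1,792 tetranucleotide features, which is why sparse training data quickly becomes a concern.
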
This, in turn, provides the opportunity to expand binding models to include features describing the propensity to adopt a local 3D shape (Fig. 2). Zhou et al. demonstrate that the improvement obtained by extending the independent model with such shape-describing features is comparable to the improvement gained by including the first-order dependencies (dimers). However, using shape required a significantly smaller number of additional features.

The flexibility and thermal dynamics of double-stranded DNA likely ensure that closely related binding sites populate overlapping distributions of structures; resolving these distributions requires either a theoretically justifiable chemical–structural principle or sufficiently dense data sampling to establish their relative probabilities. DNA shape provides one such principle. The shape vector itself may be considered to constrain implicitly the vectors that partially reflect electrostatics, base stacking, hydration, etc. An important advantage of including DNA shape among predictive features, in addition to reducing the number of features, is its biological interpretability. The propensity of a DNA fragment for a particular shape summarizes the cooperative contribution of the sequence neighborhood to that shape. A natural way for DNA shape to impact TF binding is through the free-energy differences imposed upon the double helix to conform to the shape that facilitates binding.

Importantly, high-throughput in vitro measurements can also be used to model binding free energy by relating the probability of binding to interaction energies via the Fermi–Dirac distribution (14, 15). Specifically, the probability of TF binding to a nucleotide fragment S, p(S), depends on the binding free energy E(S) and the chemical potential μ, which is a function of the TF's concentration:
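
A minimal sketch of the Fermi–Dirac occupancy relation referred to above, assuming the standard form p(S) = 1 / (1 + exp((E(S) − μ) / kT)). The thermal energy scale `kT`, the function name, and the example energies are assumptions made for illustration and are not given in the text.

```python
import math


def binding_probability(energy, mu, kT=1.0):
    """Fermi-Dirac occupancy: p = 1 / (1 + exp((energy - mu) / kT)).

    energy and mu must share the same units as kT (assumed here to be
    units of thermal energy). The branch avoids overflow for large |x|.
    """
    x = (energy - mu) / kT
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


# Hypothetical energies: a strong site, a site at E = mu, and a weak site.
p_strong = binding_probability(energy=-3.0, mu=0.0)
p_half = binding_probability(energy=0.0, mu=0.0)
p_weak = binding_probability(energy=3.0, mu=0.0)
```

Under this form a site with E(S) equal to μ is occupied half the time, sites with lower free energy approach saturation, and raising μ (a higher TF concentration) increases occupancy at every site.
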

Journal:
  • Proceedings of the National Academy of Sciences of the United States of America

Volume 112, Issue 15

Pages: -

Publication date: 2015